1 Introduction



In this paper, we present a calculation of the variance of the number of
comparisons required by the Quicksort algorithm for sorting a set, when
the pivot is chosen uniformly at random from the $n$ objects $\{x_1, \ldots, x_n\}$
(which have a total order on them, but not one initially known to us) to be
sorted. Recall that, given a pivot $x_i$, Quicksort proceeds by carrying
out pairwise comparisons (which we assume can be done) of all the other
objects with $x_i$, and using these to split the original set into two subsets:
those elements above the pivot and those below it. We then iterate this
process, choosing pivots in each smaller set uniformly at random and using
comparisons with the pivot to split each set into two others. Eventually
all the elements are in order and the algorithm terminates. The object
of interest is the number $C_n$ of comparisons required to get the $n$ elements in
order. If pivots in each set are chosen from all elements in the set uniformly
at random, $C_n$ is clearly a random variable. It is well known that the mean
$M_n$ of $C_n$ is equal to $2(n+1)H_n - 4n$, where $H_n = 1 + 1/2 + 1/3 + \cdots + 1/n$ is
the $n$th harmonic number. (Note that $H_0 = 0$.) For proofs of this fact, see
[1], [2]. We also define the $n$th harmonic number of order $k$ to be
$$H_n^{(k)} = 1 + \frac{1}{2^k} + \frac{1}{3^k} + \cdots + \frac{1}{n^k}.$$
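The procedure just described can be sketched in a few lines of Python. This is only an illustration under our own naming (`quicksort_comparisons`, `harmonic` and `mean_comparisons` are hypothetical helpers, not taken from any reference), and it assumes the items are distinct. The counter charges $n - 1$ comparisons per partitioning step, as in the model above, even though the two list comprehensions scan the list twice.

```python
import random
from fractions import Fraction

def quicksort_comparisons(items, rng):
    """Sort distinct items with uniformly random pivots.

    Returns (sorted_list, comparisons), charging len(items) - 1
    comparisons for each partitioning step.
    """
    if len(items) <= 1:
        return list(items), 0
    pivot = rng.choice(items)
    below = [x for x in items if x < pivot]
    above = [x for x in items if x > pivot]
    left, c_left = quicksort_comparisons(below, rng)
    right, c_right = quicksort_comparisons(above, rng)
    return left + [pivot] + right, c_left + c_right + len(items) - 1

def harmonic(n, k=1):
    """Harmonic number of order k as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, j ** k) for j in range(1, n + 1))

def mean_comparisons(n):
    """M_n = 2(n+1)H_n - 4n."""
    return 2 * (n + 1) * harmonic(n) - 4 * n

rng = random.Random(0)
data = list(range(10))
rng.shuffle(data)
ordered, count = quicksort_comparisons(data, rng)
```

Averaging `count` over many runs should approach `mean_comparisons(10)`; the exact value for a single run depends on the random pivots.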

In this paper, we obtain the variance of $C_n$. The formula for this is stated
without proof in Knuth [3], who in his Exercise 6.2.2-8 states the formula

$$\mathrm{Var}(C_n) = 7n^2 - 4(n+1)^2 H_n^{(2)} - 2(n+1)H_n + 13n.$$

Similarly, the papers [1] and [5] provide sketches of how to prove this fact. 
Also, in [6] the asymptotic variance of the random variable 

$$\frac{C_n - E(C_n)}{n+1}$$

is obtained using results about moments of 'the depth of insertion' in a 
tree and some martingale arguments. However we are not aware of any 
source where all details of the argument are written out explicitly with as 
few prerequisites as possible. Thus we felt it would be desirable to provide 
such an account, though we freely acknowledge that not all the details of the 
computation are particularly interesting. No originality is claimed for the 
result. 

The basic strategy of the argument is to use a sequence of reductions of
the problem. We first use generating functions to show that it is sufficient
to prove that a certain sequence $B_n$, defined in the next section, is equal to

$$B_n = 2(n+1)^2\left(H_n^2 - H_n^{(2)}\right) - H_n(n+1)(8n+2) + \frac{n(23n+17)}{2}.$$

The proof of this in turn relies on various identities involving harmonic
numbers and much manipulative algebra; readers may prefer to use MAPLE at
some stages (as we did ourselves to find the relationships initially, though we
do include proofs for completeness).



2 Proof 

The theorem that we will prove in this paper is

Theorem 2.1 If $C_n$ is the number of comparisons used by Quicksort with a
pivot chosen uniformly at random, then

$$\mathrm{Var}(C_n) = 7n^2 - 4(n+1)^2 H_n^{(2)} - 2(n+1)H_n + 13n.$$

We start with a recurrence for the generating function of $C_n$, namely
$f_n(z) = \sum_{k=0}^{n(n-1)/2} P(C_n = k) z^k$. We will use this to reduce the proof of the theorem
to proving a certain recurrence formula for a quantity to be called $B_n$ (defined
below).

Theorem 2.2 In Random Quicksort of $n$ objects, the generating functions
$f_i$ satisfy

$$f_n(z) = \frac{z^{n-1}}{n} \sum_{j=1}^{n} f_{j-1}(z) f_{n-j}(z).$$

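Proof. Let $U_n$ denote the rank of the first pivot, which is uniform on
$\{1, \ldots, n\}$. Using the equation

$$C_n = C_{U_n - 1} + C_{n - U_n} + n - 1$$

we have that

$$\begin{aligned}
P(C_n = k) &= \frac{1}{n} \sum_{m=1}^{n} P(C_n = k \mid U_n = m) \\
&= \frac{1}{n} \sum_{m=1}^{n} \sum_{j=0}^{k-(n-1)} P(C_{m-1} = j)\, P(C_{n-m} = k-(n-1)-j).
\end{aligned}$$

(We are using here the fact that $C_{m-1}$ and $C_{n-m}$ are independent.) Thus

$$P(C_n = k) z^k = \frac{1}{n} \sum_{m=1}^{n} \sum_{j=0}^{k-(n-1)} P(C_{m-1} = j) z^j \, P(C_{n-m} = k-(n-1)-j) z^{k-(n-1)-j} \, z^{n-1}.$$

Summing over $k$, so as to get the generating function
$f_n$ of $C_n$ on the left, we obtain

$$\begin{aligned}
f_n(z) &= \frac{z^{n-1}}{n} \sum_{k} \sum_{m=1}^{n} \sum_{j=0}^{k-(n-1)} P(C_{m-1} = j) z^j \, P(C_{n-m} = k-(n-1)-j) z^{k-(n-1)-j} \\
&= \frac{z^{n-1}}{n} \sum_{m=1}^{n} \sum_{j} P(C_{m-1} = j) z^j \sum_{k} P(C_{n-m} = k-(n-1)-j) z^{k-(n-1)-j}.
\end{aligned}$$

Thus

$$f_n(z) = \frac{z^{n-1}}{n} \sum_{m=1}^{n} f_{m-1}(z) f_{n-m}(z) \qquad (*)$$

as required. •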
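As a sanity check on recurrence (*), the exact distribution of $C_n$ for small $n$ can be built by multiplying the polynomials $f_{m-1}$ and $f_{n-m}$ and shifting by $z^{n-1}$. The sketch below uses exact rational arithmetic; the function name `comparison_pmfs` is our own illustrative choice.

```python
from fractions import Fraction

def comparison_pmfs(n_max):
    """Return pmfs with pmfs[n][k] = P(C_n = k), built from recurrence (*)."""
    pmfs = [[Fraction(1)]]  # C_0 = 0 with probability 1, so f_0(z) = 1
    for n in range(1, n_max + 1):
        size = n * (n - 1) // 2 + 1  # C_n never exceeds n(n-1)/2
        coeffs = [Fraction(0)] * size
        for m in range(1, n + 1):
            left, right = pmfs[m - 1], pmfs[n - m]
            for i, p in enumerate(left):
                for j, q in enumerate(right):
                    # the factor z^(n-1) shifts every exponent by n - 1
                    coeffs[i + j + n - 1] += p * q / n
        pmfs.append(coeffs)
    return pmfs

pmfs = comparison_pmfs(6)
mean_c6 = sum(k * p for k, p in enumerate(pmfs[6]))
```

For instance, `pmfs[3]` puts mass 1/3 on 2 comparisons (the pivot is the median) and mass 2/3 on 3 comparisons, and `mean_c6` agrees with $M_6 = 2 \cdot 7 \cdot H_6 - 24 = 103/10$.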

This of course leads to a recursion for the variance, using the well-known
link between the variance of a random variable $X$ and its generating function
$f_X(z)$:

$$\mathrm{Var}(X) = f_X''(1) + f_X'(1) - \left(f_X'(1)\right)^2.$$
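The link between the variance and the derivatives of the generating function at 1 is easy to illustrate on a small case. The sketch below (with the hypothetical helper name `variance_from_pgf`) uses the distribution of $C_3$: the pivot is the median with probability 1/3, giving 2 comparisons, and otherwise 3 comparisons are needed.

```python
from fractions import Fraction

def variance_from_pgf(coeffs):
    """coeffs[k] = P(X = k); returns f''(1) + f'(1) - f'(1)**2."""
    d1 = sum(k * p for k, p in enumerate(coeffs))            # f'(1)  = E[X]
    d2 = sum(k * (k - 1) * p for k, p in enumerate(coeffs))  # f''(1) = E[X(X-1)]
    return d2 + d1 - d1 ** 2

pmf_c3 = [Fraction(0), Fraction(0), Fraction(1, 3), Fraction(2, 3)]
var_c3 = variance_from_pgf(pmf_c3)
```

Here `var_c3` equals 2/9, which is also the value of $7n^2 - 4(n+1)^2H_n^{(2)} - 2(n+1)H_n + 13n$ at $n = 3$.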
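We use this formula together with equation (*) above. For the first derivative,

$$f_n'(z) = \frac{(n-1)z^{n-2}}{n} \sum_{j=1}^{n} f_{j-1}(z) f_{n-j}(z) + \frac{z^{n-1}}{n} \sum_{j=1}^{n} f_{j-1}'(z) f_{n-j}(z) + \frac{z^{n-1}}{n} \sum_{j=1}^{n} f_{j-1}(z) f_{n-j}'(z).$$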



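From standard properties of generating functions, $E(C_n) = f_n'(1)$.
Differentiating again we obtain

$$\begin{aligned}
f_n''(z) &= \frac{(n-1)(n-2)z^{n-3}}{n} \sum_{j=1}^{n} f_{j-1}(z) f_{n-j}(z) + \frac{2(n-1)z^{n-2}}{n} \sum_{j=1}^{n} f_{j-1}'(z) f_{n-j}(z) \\
&\quad + \frac{2(n-1)z^{n-2}}{n} \sum_{j=1}^{n} f_{j-1}(z) f_{n-j}'(z) + \frac{z^{n-1}}{n} \sum_{j=1}^{n} f_{j-1}''(z) f_{n-j}(z) \\
&\quad + \frac{2z^{n-1}}{n} \sum_{j=1}^{n} f_{j-1}'(z) f_{n-j}'(z) + \frac{z^{n-1}}{n} \sum_{j=1}^{n} f_{j-1}(z) f_{n-j}''(z).
\end{aligned}$$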
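Setting $z = 1$, we have (see [1])

$$\begin{aligned}
f_n''(1) &= (n-1)(n-2) + \frac{2}{n}(n-1) \sum_{j=1}^{n} M_{j-1} + \frac{2}{n}(n-1) \sum_{j=1}^{n} M_{n-j} \\
&\quad + \frac{1}{n} \sum_{j=1}^{n} \left(f_{j-1}''(1) + f_{n-j}''(1)\right) + \frac{2}{n} \sum_{j=1}^{n} M_{j-1} M_{n-j},
\end{aligned}$$

where $M_{j-1}$, $M_{n-j}$ are $f_{j-1}'(1)$, $f_{n-j}'(1)$, i.e. the mean numbers of comparisons
to sort sets of $(j-1)$ and $(n-j)$ elements respectively. Setting $B_n = f_n''(1)/2$,
we obtain

$$2B_n = (n-1)(n-2) + \frac{2(n-1)}{n} \sum_{j=1}^{n} M_{j-1} + \frac{2(n-1)}{n} \sum_{j=1}^{n} M_{n-j} + \frac{1}{n} \sum_{j=1}^{n} \left(2B_{j-1} + 2B_{n-j}\right) + \frac{2}{n} \sum_{j=1}^{n} M_{j-1} M_{n-j}.$$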

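But now, noting that $\sum_{j=1}^{n} M_{j-1} = \sum_{j=1}^{n} M_{n-j}$ (both sums are
$M_1 + \cdots + M_{n-1}$, using the fact that $M_0 = 0$), and similarly that
$\sum_{j=1}^{n} B_{j-1} = \sum_{j=1}^{n} B_{n-j}$, we see this is

$$B_n = \binom{n-1}{2} + \frac{2(n-1)}{n} \sum_{j=1}^{n} M_{j-1} + \frac{2}{n} \sum_{j=1}^{n} B_{j-1} + \frac{1}{n} \sum_{j=1}^{n} M_{j-1} M_{n-j}. \qquad (1)$$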
What this argument has shown is the following; compare [5], where it
is also shown that this recurrence has to be solved, though no details of how
to solve it are given.

Theorem 2.3 In order to prove Theorem 2.1, it is sufficient to show that
the recurrence equation (1) for $B_n$ is satisfied by

$$B_n = 2(n+1)^2 H_n^2 - (8n+2)(n+1)H_n + \frac{n(23n+17)}{2} - 2(n+1)^2 H_n^{(2)}.$$

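Proof. If we get this formula, we then have

$$\begin{aligned}
\mathrm{Var}(C_n) &= f_n''(1) + f_n'(1) - \left(f_n'(1)\right)^2 = 2B_n + 2(n+1)H_n - 4n - \left[2(n+1)H_n - 4n\right]^2 \\
&= 4(n+1)^2 H_n^2 - 2(8n+2)(n+1)H_n + n(23n+17) - 4(n+1)^2 H_n^{(2)} \\
&\quad + 2(n+1)H_n - 4n - \left[2(n+1)H_n - 4n\right]^2 \\
&= 4(n+1)^2 H_n^2 - 2(8n+2)(n+1)H_n + n(23n+17) - 4(n+1)^2 H_n^{(2)} \\
&\quad + 2(n+1)H_n - 4n - 4(n+1)^2 H_n^2 + 16n(n+1)H_n - 16n^2 \\
&= 7n^2 + 13n - 4(n+1)^2 H_n^{(2)} - 2(n+1)H_n\left[(8n+2) - 1 - 8n\right] \\
&= 7n^2 - 4(n+1)^2 H_n^{(2)} - 2(n+1)H_n + 13n
\end{aligned}$$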

as required. • 
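The reduction can also be checked numerically before any algebra is done. The following sketch, in exact rational arithmetic and with function names of our own choosing, verifies for small $n$ that the claimed closed form for $B_n$ is reproduced by the right-hand side of recurrence (1), and that $2B_n + M_n - M_n^2$ gives the variance formula of Theorem 2.1.

```python
from fractions import Fraction
from math import comb

def harmonic(n, k=1):
    """Harmonic number of order k as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, j ** k) for j in range(1, n + 1))

def mean(n):
    """M_n = 2(n+1)H_n - 4n."""
    return 2 * (n + 1) * harmonic(n) - 4 * n

def b_closed(n):
    """Claimed closed form for B_n = f_n''(1)/2."""
    h, h2 = harmonic(n), harmonic(n, 2)
    return (2 * (n + 1) ** 2 * (h * h - h2)
            - h * (n + 1) * (8 * n + 2)
            + Fraction(n * (23 * n + 17), 2))

def b_from_recurrence(n):
    """Right-hand side of (1), fed with the closed form for B_0..B_{n-1}."""
    m_sum = sum(mean(j - 1) for j in range(1, n + 1))
    b_sum = sum(b_closed(j - 1) for j in range(1, n + 1))
    mm_sum = sum(mean(j - 1) * mean(n - j) for j in range(1, n + 1))
    return (comb(n - 1, 2) + Fraction(2 * (n - 1), n) * m_sum
            + Fraction(2, n) * b_sum + Fraction(1, n) * mm_sum)

def variance(n):
    """Variance formula of Theorem 2.1."""
    return (7 * n * n - 4 * (n + 1) ** 2 * harmonic(n, 2)
            - 2 * (n + 1) * harmonic(n) + 13 * n)
```

Such a check is of course no substitute for the proof that follows; it only guards against transcription slips in the formulas.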



3 Solution of the recurrence for $B_n$



We have to solve the recurrence. For the sum of the $M_{j-1}$, the expected
numbers of comparisons, we have

$$\sum_{j=1}^{n} M_{j-1} = \sum_{j=1}^{n} \left[2jH_{j-1} - 4(j-1)\right] = 2\sum_{j=1}^{n} jH_{j-1} - 4\sum_{j=1}^{n} (j-1).$$

For the computation of the first sum, we use the following Lemma.

Lemma 3.1 For $n \in \mathbb{N}$,

$$\sum_{j=1}^{n} jH_{j-1} = \frac{n(n+1)H_{n+1}}{2} - \frac{n(n+5)}{4}, \qquad \sum_{j=1}^{n} M_{j-1} = n(n+1)H_{n+1} - \frac{5n^2 + n}{2}.$$
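Proof. By induction on $n$, the case $n = 1$ being trivial. Suppose that it
holds for all $n \le k$. Then for $n = k + 1$ we have

$$\begin{aligned}
\sum_{j=1}^{k+1} jH_{j-1} &= \sum_{j=1}^{k} jH_{j-1} + (k+1)H_k = \frac{k(k+1)H_{k+1}}{2} - \frac{k(k+5)}{4} + (k+1)H_k \\
&= \frac{k(k+1)H_{k+1}}{2} - \frac{k(k+5)}{4} + (k+1)\left(H_{k+1} - \frac{1}{k+1}\right) \\
&= \frac{(k+1)(k+2)H_{k+1}}{2} - \frac{k(k+5)}{4} - 1 \\
&= \frac{(k+1)(k+2)H_{k+2}}{2} - \frac{k+1}{2} - \frac{k(k+5)}{4} - 1 \\
&= \frac{(k+1)(k+2)H_{k+2}}{2} - \frac{k^2 + 7k + 6}{4} \\
&= \frac{(k+1)(k+2)H_{k+2}}{2} - \frac{(k+1)(k+6)}{4},
\end{aligned}$$

which is $\frac{n(n+1)H_{n+1}}{2} - \frac{n(n+5)}{4}$ with $n = k + 1$,
giving the first claim. The second claim follows on recalling that
$\sum_{j=1}^{n} (j-1) = n(n-1)/2$. •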
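Both identities of Lemma 3.1 are easy to confirm for small $n$ with exact fractions. The helper names below (`harmonic`, `mean`, `lemma_3_1_holds`) are our own and purely illustrative.

```python
from fractions import Fraction

def harmonic(n):
    """H_n as an exact fraction, with H_0 = 0."""
    return sum(Fraction(1, j) for j in range(1, n + 1))

def mean(n):
    """M_n = 2(n+1)H_n - 4n."""
    return 2 * (n + 1) * harmonic(n) - 4 * n

def lemma_3_1_holds(n):
    """Check both claims of Lemma 3.1 for a given n."""
    h = harmonic(n + 1)
    first = sum(j * harmonic(j - 1) for j in range(1, n + 1))
    second = sum(mean(j - 1) for j in range(1, n + 1))
    return (first == Fraction(n * (n + 1), 2) * h - Fraction(n * (n + 5), 4)
            and second == n * (n + 1) * h - Fraction(5 * n * n + n, 2))
```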

Now, we will compute the term

$$\sum_{j=1}^{n} M_{j-1} M_{n-j}. \qquad (2)$$

We shall use three Lemmas in the proof.

Lemma 3.2 For $n \in \mathbb{N}$, it holds that

$$\sum_{j=1}^{n} M_{j-1} M_{n-j} = 4\sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j} - \frac{8}{3}n(n^2-1)H_{n+1} + \frac{44n}{9}(n^2-1).$$
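Proof. To do this, we will again use the formula obtained previously for
$M_j$. We have

$$\begin{aligned}
\sum_{j=1}^{n} M_{j-1} M_{n-j} &= \sum_{j=1}^{n} \left[(2jH_{j-1} - 4j + 4)\left(2(n-j+1)H_{n-j} - 4n + 4j\right)\right] \\
&= 4\sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j} - 8n\sum_{j=1}^{n} jH_{j-1} + 8\sum_{j=1}^{n} j^2 H_{j-1} \\
&\quad - 8\sum_{j=1}^{n} j(n-j+1)H_{n-j} + 16n\sum_{j=1}^{n} j - 16\sum_{j=1}^{n} j^2 + 8\sum_{j=1}^{n} (n-j+1)H_{n-j} \\
&\quad - 16n^2 + 16\sum_{j=1}^{n} j.
\end{aligned}$$

We need to work out the value of $\sum_{j=1}^{n} j^2 H_{j-1}$. Using MAPLE initially, we
found

$$\sum_{j=1}^{n} j^2 H_{j-1} = \frac{6n(n+1)(2n+1)H_{n+1} - n(n+1)(4n+23)}{36};$$

we will confirm this by induction.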
Lemma 3.3 For $n \in \mathbb{N}$, it holds that

$$\sum_{j=1}^{n} j^2 H_{j-1} = \frac{6n(n+1)(2n+1)H_{n+1} - n(n+1)(4n+23)}{36}.$$

Proof. By induction on $n$, the case $n = 1$ yielding $1^2 H_0 = 0$ on the
left-hand side and on the right-hand side

$$\frac{36H_2 - 54}{36} = \frac{36 + 18 - 54}{36} = 0.$$

Suppose that the equation holds for all $n \le k$. For $n = k + 1$, we have

$$\begin{aligned}
\sum_{j=1}^{k+1} j^2 H_{j-1} &= \sum_{j=1}^{k} j^2 H_{j-1} + (k+1)^2 H_k \\
&= \frac{6k(k+1)(2k+1)H_{k+1} - k(k+1)(4k+23)}{36} + (k+1)^2 H_k \\
&= \frac{6(k+1)H_{k+1}(2k^2 + 7k + 6) - k(k+1)(4k+23) - 36(k+1)}{36} \\
&= \frac{6(k+1)H_{k+1}(k+2)(2k+3) - (k+1)(4k^2 + 23k + 36)}{36} \\
&= \frac{6(k+1)H_{k+2}(k+2)(2k+3) - 6(k+1)(2k+3) - (k+1)(4k^2 + 23k + 36)}{36} \\
&= \frac{6(k+1)(k+2)H_{k+2}(2k+3) - (k+1)(4k^2 + 35k + 54)}{36} \\
&= \frac{6(k+1)(k+2)H_{k+2}(2k+3) - (k+1)(k+2)(4k+27)}{36},
\end{aligned}$$

where the third line uses $H_k = H_{k+1} - 1/(k+1)$, finishing the proof of Lemma 3.3. •



We also need to compute $\sum_{j=1}^{n} j(n-j+1)H_{n-j}$. We have

Lemma 3.4 For $n \in \mathbb{N}$,

$$\sum_{j=1}^{n} j(n-j+1)H_{n-j} = \frac{6nH_{n+1}(n^2 + 3n + 2) - 5n^3 - 27n^2 - 22n}{36}.$$

Proof. We can write $j = (n+1) - (n-j+1)$. Then, substituting
$k = n - j + 1$, we obtain

$$\begin{aligned}
\sum_{j=1}^{n} j(n-j+1)H_{n-j} &= \sum_{j=1}^{n} \left[(n+1) - (n-j+1)\right](n-j+1)H_{n-j} \\
&= (n+1)\sum_{j=1}^{n} (n-j+1)H_{n-j} - \sum_{j=1}^{n} (n-j+1)^2 H_{n-j} \\
&= (n+1)\sum_{k=1}^{n} kH_{k-1} - \sum_{k=1}^{n} k^2 H_{k-1}.
\end{aligned}$$

Thus, since we know both sums by Lemmas 3.1 and 3.3, we get

$$\begin{aligned}
\sum_{j=1}^{n} j(n-j+1)H_{n-j} &= (n+1)\left(\frac{n(n+1)H_{n+1}}{2} - \frac{n(n+5)}{4}\right) - \frac{6n(n+1)(2n+1)H_{n+1} - n(n+1)(4n+23)}{36} \\
&= \frac{n(n+1)^2 H_{n+1}}{2} - \frac{n(n+1)(n+5)}{4} - \frac{6n(n+1)(2n+1)H_{n+1} - n(n+1)(4n+23)}{36} \\
&= \frac{18n(n+1)^2 H_{n+1} - 9n(n+1)(n+5)}{36} - \frac{6n(n+1)(2n+1)H_{n+1} - n(n+1)(4n+23)}{36} \\
&= \frac{6nH_{n+1}(n^2 + 3n + 2) - n(n+1)(5n+22)}{36},
\end{aligned}$$

which is easily checked to be equal to the quantity in the statement above
on expanding out. •
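Lemmas 3.3 and 3.4 can be confirmed in the same spirit. The sketch below is a numerical check only, with illustrative function names of our own.

```python
from fractions import Fraction

def harmonic(n):
    """H_n as an exact fraction, with H_0 = 0."""
    return sum(Fraction(1, j) for j in range(1, n + 1))

def lemma_3_3_holds(n):
    """sum of j^2 H_{j-1} against the closed form of Lemma 3.3."""
    lhs = sum(j * j * harmonic(j - 1) for j in range(1, n + 1))
    rhs = (6 * n * (n + 1) * (2 * n + 1) * harmonic(n + 1)
           - n * (n + 1) * (4 * n + 23)) / 36
    return lhs == rhs

def lemma_3_4_holds(n):
    """sum of j (n-j+1) H_{n-j} against the closed form of Lemma 3.4."""
    lhs = sum(j * (n - j + 1) * harmonic(n - j) for j in range(1, n + 1))
    rhs = (6 * n * harmonic(n + 1) * (n * n + 3 * n + 2)
           - 5 * n ** 3 - 27 * n ** 2 - 22 * n) / 36
    return lhs == rhs
```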

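We are now ready to complete the evaluation of $\sum_{j=1}^{n} M_{j-1}M_{n-j}$. Note
first that $\sum_{j=1}^{n} (n-j+1)H_{n-j} = \sum_{k=1}^{n} kH_{k-1}$ (set $k = n - j + 1$) and so
Lemma 3.1 can be used to compute it. Pulling everything together, we have

$$\begin{aligned}
\sum_{j=1}^{n} M_{j-1}M_{n-j} &= 4\sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j} - 8n\left(\frac{n(n+1)H_{n+1}}{2} - \frac{n(n+5)}{4}\right) \\
&\quad + 8\cdot\frac{6n(n+1)(2n+1)H_{n+1} - n(n+1)(4n+23)}{36} - 8\cdot\frac{6nH_{n+1}(n^2+3n+2) - 5n^3 - 27n^2 - 22n}{36} \\
&\quad + 16n\sum_{j=1}^{n} j - 16\sum_{j=1}^{n} j^2 + 8\left(\frac{n(n+1)H_{n+1}}{2} - \frac{n(n+5)}{4}\right) - 16n^2 + 16\sum_{j=1}^{n} j \\
&= 4\sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j} - 4n^2(n+1)H_{n+1} + 2n^2(n+5) \\
&\quad + \frac{2}{9}\left[6n(n^2-1)H_{n+1} + n^3 - n\right] + 8n^2(n+1) - \frac{8n(n+1)(2n+1)}{3} \\
&\quad + 4n(n+1)H_{n+1} - 2n(n+5) - 16n^2 + 8n(n+1) \\
&= 4\sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j} - 4n(n+1)(n-1)H_{n+1} \\
&\quad + \frac{4}{3}n(n^2-1)H_{n+1} + \frac{44}{9}n(n^2-1).
\end{aligned}$$

Thus we indeed get the conclusion of Lemma 3.2, namely that

$$\sum_{j=1}^{n} M_{j-1}M_{n-j} = 4\sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j} - \frac{8n}{3}(n^2-1)H_{n+1} + \frac{44n}{9}(n^2-1).$$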
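The conclusion of Lemma 3.2 involves many cancellations, so a direct numerical comparison of both sides is a useful safeguard. The sketch below uses exact fractions; `lemma_3_2_holds` is a hypothetical name chosen for this illustration.

```python
from fractions import Fraction

def harmonic(n):
    """H_n as an exact fraction, with H_0 = 0."""
    return sum(Fraction(1, j) for j in range(1, n + 1))

def mean(n):
    """M_n = 2(n+1)H_n - 4n."""
    return 2 * (n + 1) * harmonic(n) - 4 * n

def lemma_3_2_holds(n):
    """Compare the convolution of the means with the reduced form."""
    lhs = sum(mean(j - 1) * mean(n - j) for j in range(1, n + 1))
    double_sum = sum(j * harmonic(j - 1) * (n - j + 1) * harmonic(n - j)
                     for j in range(1, n + 1))
    rhs = (4 * double_sum
           - Fraction(8, 3) * n * (n * n - 1) * harmonic(n + 1)
           + Fraction(44 * n, 9) * (n * n - 1))
    return lhs == rhs
```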
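Returning to the recurrence relation (1), we obtain from Lemmas 3.1
and 3.2 that

$$\begin{aligned}
B_n &= \binom{n-1}{2} + \frac{2(n-1)}{n}\left(n(n+1)H_{n+1} - \frac{5n^2+n}{2}\right) + \frac{4}{n}\sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j} \\
&\quad - \frac{8}{3}(n^2-1)H_{n+1} + \frac{44}{9}(n^2-1) + \frac{2}{n}\sum_{j=1}^{n} B_{j-1} \\
&= \frac{(n-1)(n-2)}{2} + 2(n-1)(n+1)H_{n+1} - (n-1)(5n+1) + \frac{4}{n}\sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j} \\
&\quad - \frac{8}{3}(n^2-1)H_{n+1} + \frac{44}{9}(n^2-1) + \frac{2}{n}\sum_{j=1}^{n} B_{j-1}.
\end{aligned}$$

Finally,

$$B_n = \frac{4}{n}\sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j} + \frac{2}{n}\sum_{j=1}^{n} B_{j-1} + \frac{-9n^2 + 5n + 4}{2} - \frac{2}{3}(n^2-1)H_{n+1} + \frac{44}{9}(n^2-1).$$

Multiplying by $n$, we have

$$nB_n = 4\sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j} + 2\sum_{j=1}^{n} B_{j-1} + n\,\frac{-9n^2 + 5n + 4}{2} - \frac{2}{3}n(n^2-1)H_{n+1} + \frac{44}{9}n(n^2-1).$$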
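For $n + 1$, we have similarly

$$\begin{aligned}
(n+1)B_{n+1} &= 4\sum_{j=1}^{n+1} jH_{j-1}(n-j+2)H_{n+1-j} + 2\sum_{j=1}^{n+1} B_{j-1} + (n+1)\frac{-9(n+1)^2 + 5(n+1) + 4}{2} \\
&\quad - \frac{2}{3}(n+1)\left[(n+1)^2 - 1\right]H_{n+2} + \frac{44}{9}(n+1)\left[(n+1)^2 - 1\right].
\end{aligned}$$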
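Subtracting $nB_n$ from $(n+1)B_{n+1}$, we obtain

$$\begin{aligned}
(n+1)B_{n+1} - nB_n &= 4\left[\sum_{j=1}^{n+1} jH_{j-1}(n-j+2)H_{n+1-j} - \sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j}\right] \\
&\quad + 2B_n + (n+1)\frac{-9(n+1)^2 + 5(n+1) + 4}{2} - n\,\frac{-9n^2 + 5n + 4}{2} \\
&\quad - \frac{2}{3}(n+1)\left[(n+1)^2 - 1\right]H_{n+2} + \frac{2}{3}n(n^2-1)H_{n+1} \\
&\quad + \frac{44}{9}(n+1)\left[(n+1)^2 - 1\right] - \frac{44}{9}n(n^2-1) \\
&= 4\left[\sum_{j=1}^{n} jH_{j-1}(n-j+2)H_{n+1-j} - \sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j}\right] \\
&\quad + 2B_n - \frac{27n^2 + 17n}{2} - \frac{2}{3}n(n+1)(n+2)H_{n+2} + \frac{2}{3}nH_{n+1}(n^2-1) + \frac{44n(n+1)}{3},
\end{aligned}$$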

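noting that the term for $j = n + 1$ does not contribute to the first sum (since $H_0 = 0$). In the
first sum, we use the facts that $H_{n+1-j} = H_{n-j} + 1/(n+1-j)$ and that
$n - j + 2 = (n - j + 1) + 1$, and then we get

$$\begin{aligned}
&4\left[\sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j} + \sum_{j=1}^{n} jH_{j-1}\frac{n-j+1}{n+1-j}\right. \\
&\quad \left. + \sum_{j=1}^{n} jH_{j-1}H_{n-j+1} - \sum_{j=1}^{n} jH_{j-1}(n-j+1)H_{n-j}\right] + 2B_n \\
&\quad - \frac{27n^2 + 17n}{2} - \frac{2}{3}n(n+1)(n+2)H_{n+2} + \frac{2}{3}nH_{n+1}(n^2-1) + \frac{44n(n+1)}{3}.
\end{aligned}$$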

The first sum on the first line cancels with the equal sum on the second
line, the second sum on the first line simplifies to $\sum_{j=1}^{n} jH_{j-1}$, and using
$H_{n+2} = H_{n+1} + 1/(n+2)$ on the last line, we obtain

$$(n+1)B_{n+1} - nB_n = 4\left[\sum_{j=1}^{n} jH_{j-1} + \sum_{j=1}^{n} jH_{j-1}H_{n-j+1}\right] + 2B_n - 2n(n+1)H_{n+1} + \frac{1}{2}n(n+11).$$
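Since the final formula for $B_n$ is already stated in Theorem 2.3, the difference equation just obtained can be tested against it for small $n$. The sketch below does this with exact fractions; the names `b_closed` and `step_identity_holds` are illustrative assumptions of ours.

```python
from fractions import Fraction

def harmonic(n, k=1):
    """Harmonic number of order k as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, j ** k) for j in range(1, n + 1))

def b_closed(n):
    """Closed form for B_n stated in Theorem 2.3."""
    h, h2 = harmonic(n), harmonic(n, 2)
    return (2 * (n + 1) ** 2 * (h * h - h2)
            - h * (n + 1) * (8 * n + 2)
            + Fraction(n * (23 * n + 17), 2))

def step_identity_holds(n):
    """Check the expression for (n+1)B_{n+1} - nB_n."""
    s1 = sum(j * harmonic(j - 1) for j in range(1, n + 1))
    s2 = sum(j * harmonic(j - 1) * harmonic(n - j + 1)
             for j in range(1, n + 1))
    lhs = (n + 1) * b_closed(n + 1) - n * b_closed(n)
    rhs = (4 * (s1 + s2) + 2 * b_closed(n)
           - 2 * n * (n + 1) * harmonic(n + 1)
           + Fraction(n * (n + 11), 2))
    return lhs == rhs
```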

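We thus see that we have to work out the following expression:

$$\sum_{j=1}^{n} jH_{j-1}H_{n+1-j}.$$

We note that

$$\begin{aligned}
\sum_{j=1}^{n} jH_{j-1}H_{n+1-j} &= 1 \cdot H_0H_n + 2H_1H_{n-1} + 3H_2H_{n-2} + \cdots + (n-1)H_{n-2}H_2 + nH_{n-1}H_1 \\
&= \frac{n+2}{2}\left(H_1H_{n-1} + H_2H_{n-2} + \cdots + H_{n-2}H_2 + H_{n-1}H_1 + H_nH_0\right),
\end{aligned}$$

since the terms with indices $j$ and $n + 2 - j$ have the same product of harmonic
numbers and their coefficients add up to $n + 2$. So it suffices now to obtain the
quantity $\sum_{j=1}^{n} H_jH_{n-j}$.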

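Sedgewick [7] presents and proves the following result:

Lemma 3.5

$$\sum_{i=1}^{n} H_iH_{n+1-i} = (n+2)\left(H_{n+1}^2 - H_{n+1}^{(2)}\right) - 2(n+1)(H_{n+1} - 1).$$

Proof.

$$\sum_{i=1}^{n} H_iH_{n+1-i} = \sum_{i=1}^{n} H_i \sum_{j=1}^{n+1-i} \frac{1}{j} = \sum_{i=1}^{n} \sum_{j=1}^{n+1-i} \frac{H_i}{j}.$$

The set $\{(i,j) : 1 \le i \le n,\ 1 \le j \le n+1-i\}$ is, as a picture easily shows,
the same as $\{(i,j) : 1 \le i \le n+1-j,\ 1 \le j \le n\}$. Thus the above is

$$\sum_{j=1}^{n} \frac{1}{j} \sum_{i=1}^{n+1-j} H_i = \sum_{j=1}^{n} \frac{1}{j}\left[(n+2-j)H_{n+1-j} - (n+1-j)\right].$$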

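To see the claim about the sum of the $H_j$s, we note that

$$\begin{aligned}
\sum_{j=1}^{n} H_j &= H_1 + H_2 + \cdots + H_n = 1 + \left(1 + \frac{1}{2}\right) + \cdots + \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) \\
&= n + (n-1)\frac{1}{2} + \cdots + \left(n - (n-1)\right)\frac{1}{n} = n\left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) - \left(\frac{1}{2} + \frac{2}{3} + \cdots + \frac{n-1}{n}\right) \\
&= nH_n - n + H_n = (n+1)H_n - n.
\end{aligned}$$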
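Thus we get, reusing the result about $\sum_{j=1}^{n} H_j$,

$$\begin{aligned}
\sum_{i=1}^{n} H_iH_{n+1-i} &= (n+2)\sum_{j=1}^{n} \frac{H_{n+1-j}}{j} - \sum_{j=1}^{n} H_{n+1-j} - (n+1)H_n + n \\
&= (n+2)\sum_{j=1}^{n} \frac{H_{n+1-j}}{j} - \left[(n+1)H_n - n\right] - (n+1)H_n + n \\
&= (n+2)\sum_{j=1}^{n} \frac{H_{n+1-j}}{j} - 2(n+1)(H_{n+1} - 1).
\end{aligned}$$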

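To analyse the first sum above, we note (again following [7] here)

$$\begin{aligned}
\sum_{j=1}^{n} \frac{H_{n+1-j}}{j} &= \sum_{j=1}^{n} \frac{H_{n-j}}{j} + \sum_{j=1}^{n} \frac{1}{j(n+1-j)} \\
&= \sum_{j=1}^{n-1} \frac{H_{n-j}}{j} + \frac{1}{n+1}\sum_{j=1}^{n} \left(\frac{1}{j} + \frac{1}{n+1-j}\right),
\end{aligned}$$

and this gives that

$$\sum_{j=1}^{n} \frac{H_{n+1-j}}{j} = \sum_{j=1}^{n-1} \frac{H_{n-j}}{j} + \frac{2H_n}{n+1}.$$

Iterating this equation, and using $H_0 = 0$ at the end, we obtain the identity

$$\sum_{j=1}^{n} \frac{H_{n+1-j}}{j} = 2\sum_{k=1}^{n} \frac{H_k}{k+1}.$$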

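The right-hand side is

$$\begin{aligned}
2\sum_{k=1}^{n} \frac{H_k}{k+1} &= 2\sum_{k=2}^{n+1} \frac{H_{k-1}}{k} = 2\sum_{k=1}^{n+1} \frac{H_{k-1}}{k} = 2\sum_{k=1}^{n+1} \frac{H_k}{k} - 2\sum_{k=1}^{n+1} \frac{1}{k^2} \\
&= 2\sum_{k=1}^{n+1} \sum_{j=1}^{k} \frac{1}{jk} - 2H_{n+1}^{(2)} = 2\sum_{j=1}^{n+1} \sum_{k=j}^{n+1} \frac{1}{jk} - 2H_{n+1}^{(2)} = 2\sum_{k=1}^{n+1} \sum_{j=k}^{n+1} \frac{1}{jk} - 2H_{n+1}^{(2)}.
\end{aligned}$$

Again noting that $\{(j,k) : k \le j \le n+1,\ 1 \le k \le n+1\}$ gives the same
values of $1/(jk)$ as $\{(j,k) : 1 \le j \le k,\ 1 \le k \le n+1\}$, provided we note that
the terms for $j = k$ are repeated, we get

$$2\sum_{k=1}^{n+1} \sum_{j=k}^{n+1} \frac{1}{jk} - 2H_{n+1}^{(2)} = \sum_{k=1}^{n+1} \sum_{j=1}^{n+1} \frac{1}{jk} + \sum_{k=1}^{n+1} \frac{1}{k^2} - 2H_{n+1}^{(2)} = H_{n+1}^2 + H_{n+1}^{(2)} - 2H_{n+1}^{(2)} = H_{n+1}^2 - H_{n+1}^{(2)}.$$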

Thus, we have, as in the statement of Lemma 3.5,

$$\sum_{i=1}^{n} H_iH_{n+1-i} = (n+2)\left(H_{n+1}^2 - H_{n+1}^{(2)}\right) - 2(n+1)(H_{n+1} - 1). \qquad \text{•}$$
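Lemma 3.5, together with the auxiliary identity for $\sum_j H_j$ used in its proof, can be confirmed numerically. The sketch below uses exact fractions and function names of our own choosing.

```python
from fractions import Fraction

def harmonic(n, k=1):
    """Harmonic number of order k as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, j ** k) for j in range(1, n + 1))

def harmonic_sum_holds(n):
    """H_1 + ... + H_n = (n+1)H_n - n."""
    total = sum(harmonic(j) for j in range(1, n + 1))
    return total == (n + 1) * harmonic(n) - n

def lemma_3_5_holds(n):
    """Convolution of harmonic numbers against the closed form."""
    lhs = sum(harmonic(i) * harmonic(n + 1 - i) for i in range(1, n + 1))
    h, h2 = harmonic(n + 1), harmonic(n + 1, 2)
    rhs = (n + 2) * (h * h - h2) - 2 * (n + 1) * (h - 1)
    return lhs == rhs
```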

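Also, the following Corollary is obtained by Sedgewick, using equations
from the proof of the last Lemma.

Corollary 3.6 For $n \in \mathbb{N}$, it holds that

$$H_{n+1}^2 - H_{n+1}^{(2)} = 2\sum_{j=1}^{n} \frac{H_j}{j+1}.$$

Proof. From equations (3) and (4), we see that

$$H_{n+1}^2 - H_{n+1}^{(2)} = \sum_{j=1}^{n} \frac{H_{n+1-j}}{j} = \sum_{j=1}^{n-1} \frac{H_{n-j}}{j} + \frac{2H_n}{n+1} = H_n^2 - H_n^{(2)} + \frac{2H_n}{n+1},$$

and the claim follows by iteration. •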

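We will use the above Lemma and Corollary in our analysis. We have
that

$$\sum_{i=1}^{n} H_iH_{n+1-i} = \sum_{i=1}^{n} \left[H_i\left(H_{n-i} + \frac{1}{n+1-i}\right)\right] = \sum_{i=1}^{n} H_iH_{n-i} + \sum_{i=1}^{n} \frac{H_i}{n+1-i}.$$

The second sum, substituting $j = n + 1 - i$, becomes

$$\sum_{i=1}^{n} \frac{H_i}{n+1-i} = \sum_{j=1}^{n} \frac{H_{n+1-j}}{j}.$$

As we have seen, this is equal to

$$\sum_{j=1}^{n} \frac{H_{n+1-j}}{j} = H_{n+1}^2 - H_{n+1}^{(2)}.$$

Hence, by Lemma 3.5,

$$\begin{aligned}
\sum_{i=1}^{n} H_iH_{n-i} &= (n+2)\left(H_{n+1}^2 - H_{n+1}^{(2)}\right) - 2(n+1)(H_{n+1} - 1) - \left(H_{n+1}^2 - H_{n+1}^{(2)}\right) \\
&= (n+1)\left[\left(H_{n+1}^2 - H_{n+1}^{(2)}\right) - 2(H_{n+1} - 1)\right].
\end{aligned}$$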

Using the above equation and the result obtained just before
Lemma 3.5, we deduce that

$$\sum_{j=1}^{n} jH_{j-1}H_{n+1-j} = \frac{(n+1)(n+2)}{2}\left[\left(H_{n+1}^2 - H_{n+1}^{(2)}\right) - 2(H_{n+1} - 1)\right].$$
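The chain of identities leading to this weighted sum can be checked for small $n$. The sketch below, with illustrative names of our own, tests the identity $\sum_j H_{n+1-j}/j = H_{n+1}^2 - H_{n+1}^{(2)} = 2\sum_j H_j/(j+1)$ and the closed form for $\sum_j jH_{j-1}H_{n+1-j}$.

```python
from fractions import Fraction

def harmonic(n, k=1):
    """Harmonic number of order k as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, j ** k) for j in range(1, n + 1))

def corollary_holds(n):
    """Both expressions for H_{n+1}^2 - H_{n+1}^{(2)}."""
    target = harmonic(n + 1) ** 2 - harmonic(n + 1, 2)
    reversed_sum = sum(harmonic(n + 1 - j) / j for j in range(1, n + 1))
    shifted_sum = 2 * sum(harmonic(j) / (j + 1) for j in range(1, n + 1))
    return target == reversed_sum == shifted_sum

def weighted_sum_holds(n):
    """Closed form for the sum of j H_{j-1} H_{n+1-j}."""
    lhs = sum(j * harmonic(j - 1) * harmonic(n + 1 - j)
              for j in range(1, n + 1))
    h, h2 = harmonic(n + 1), harmonic(n + 1, 2)
    rhs = Fraction((n + 1) * (n + 2), 2) * ((h * h - h2) - 2 * (h - 1))
    return lhs == rhs
```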

Having worked out all the expressions involved in the following relation,
we can now finish off:

$$(n+1)B_{n+1} - nB_n = 4\left(\sum_{j=1}^{n} jH_{j-1} + \sum_{j=1}^{n} jH_{j-1}H_{n-j+1}\right) + 2B_n - 2n(n+1)H_{n+1} + \frac{1}{2}n(n+11).$$

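We have

$$\begin{aligned}
(n+1)B_{n+1} - nB_n &= 4\left[\frac{n(n+1)H_{n+1}}{2} - \frac{n(n+5)}{4} + \frac{(n+1)(n+2)}{2}\left(\left(H_{n+1}^2 - H_{n+1}^{(2)}\right) - 2(H_{n+1} - 1)\right)\right] \\
&\quad + 2B_n - 2n(n+1)H_{n+1} + \frac{1}{2}n(n+11) \\
&= 2n(n+1)H_{n+1} - n(n+5) + 2(n+1)(n+2)\left(H_{n+1}^2 - H_{n+1}^{(2)}\right) \\
&\quad - 4(n+1)(n+2)(H_{n+1} - 1) + 2B_n - 2n(n+1)H_{n+1} + \frac{1}{2}n(n+11) \\
&= 2(n+1)(n+2)\left(H_{n+1}^2 - H_{n+1}^{(2)}\right) - 4(n+1)(n+2)(H_{n+1} - 1) - \frac{n(n-1)}{2} + 2B_n.
\end{aligned}$$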
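Then, moving $nB_n$ to the right-hand side and dividing by $n + 1$,

$$B_{n+1} = 2(n+2)\left(H_{n+1}^2 - H_{n+1}^{(2)}\right) - 4(n+2)(H_{n+1} - 1) - \frac{n(n-1)}{2(n+1)} + \frac{n+2}{n+1}B_n,$$

so that

$$\frac{B_{n+1}}{n+2} = \frac{B_n}{n+1} + 2\left(H_{n+1}^2 - H_{n+1}^{(2)}\right) - 4(H_{n+1} - 1) - \frac{n(n-1)}{2(n+1)(n+2)}.$$

The last equation is equivalent to

$$\frac{B_n}{n+1} = \frac{B_{n-1}}{n} + 2\left(H_n^2 - H_n^{(2)}\right) - 4(H_n - 1) - \frac{(n-1)(n-2)}{2n(n+1)}.$$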

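Iterating the recurrence relation, we obtain

$$\frac{B_n}{n+1} = \frac{B_0}{1} + 2\sum_{i=1}^{n} \left(H_i^2 - H_i^{(2)}\right) - 4\sum_{i=1}^{n} (H_i - 1) - \sum_{i=1}^{n} \frac{(i-1)(i-2)}{2i(i+1)}.$$

Since $B_0 = 0$ and $\frac{(i-1)(i-2)}{2i(i+1)} = \frac{1}{2} + \frac{1}{i} - \frac{3}{i+1}$, it is

$$\frac{B_n}{n+1} = 2\sum_{i=1}^{n} \left(H_i^2 - H_i^{(2)}\right) - 4\sum_{i=1}^{n} (H_i - 1) - \sum_{i=1}^{n} \left(\frac{1}{2} + \frac{1}{i} - \frac{3}{i+1}\right).$$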

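The first sum, by Corollary 3.6, is equal to

$$\begin{aligned}
\sum_{i=1}^{n} \left(H_i^2 - H_i^{(2)}\right) &= 2\sum_{i=1}^{n} \sum_{j=1}^{i-1} \frac{H_j}{j+1} = 2\sum_{i=1}^{n-1} \frac{(n-i)H_i}{i+1} \\
&= n\left(H_n^2 - H_n^{(2)}\right) - 2\sum_{i=1}^{n-1} \frac{iH_i}{i+1} \\
&= (n+1)\left(H_n^2 - H_n^{(2)}\right) - 2\sum_{i=1}^{n-1} \left(\frac{H_i}{i+1} + \frac{iH_i}{i+1}\right) \\
&= (n+1)\left(H_n^2 - H_n^{(2)}\right) - 2\sum_{i=1}^{n-1} H_i \\
&= (n+1)\left(H_n^2 - H_n^{(2)}\right) + 2n - 2nH_n.
\end{aligned}$$

Note that on the third line we add and subtract simultaneously $\left(H_n^2 - H_n^{(2)}\right)$,
which is equal to $2\sum_{i=1}^{n-1} H_i/(i+1)$ by Corollary 3.6. Doing so, the fraction is
cancelled and the corresponding sum can be easily computed. Hence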

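$$\begin{aligned}
\frac{B_n}{n+1} &= 2(n+1)\left(H_n^2 - H_n^{(2)}\right) + 4n - 4nH_n - 4\left[(n+1)H_n - 2n\right] - \sum_{i=1}^{n} \left(\frac{1}{2} + \frac{1}{i} - \frac{3}{i+1}\right) \\
&= 2(n+1)\left(H_n^2 - H_n^{(2)}\right) + 12n - 8nH_n - 4H_n - \frac{n}{2} - H_n + 3H_{n+1} - 3 \\
&= 2(n+1)\left(H_n^2 - H_n^{(2)}\right) + 12n - 8nH_n - 5H_n - \frac{n}{2} + 3H_n + \frac{3}{n+1} - 3 \\
&= 2(n+1)\left(H_n^2 - H_n^{(2)}\right) - H_n(8n+2) + \frac{23n}{2} + \frac{3}{n+1} - 3 \\
&= 2(n+1)\left(H_n^2 - H_n^{(2)}\right) - H_n(8n+2) + \frac{n(23n+17)}{2(n+1)}.
\end{aligned}$$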

Finally, multiplying both sides by $n + 1$ we obtain

$$B_n = 2(n+1)^2\left(H_n^2 - H_n^{(2)}\right) - H_n(n+1)(8n+2) + \frac{n(23n+17)}{2}.$$
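As a last check, the first-order recurrence $\frac{B_n}{n+1} = \frac{B_{n-1}}{n} + 2(H_n^2 - H_n^{(2)}) - 4(H_n - 1) - \frac{(n-1)(n-2)}{2n(n+1)}$ can be iterated directly from $B_0 = 0$ and compared with the closed form. The sketch below does so in exact arithmetic; `b_iterated` and `b_closed` are names we introduce for illustration.

```python
from fractions import Fraction

def harmonic(n, k=1):
    """Harmonic number of order k as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, j ** k) for j in range(1, n + 1))

def b_closed(n):
    """Closed form for B_n obtained above."""
    h, h2 = harmonic(n), harmonic(n, 2)
    return (2 * (n + 1) ** 2 * (h * h - h2)
            - h * (n + 1) * (8 * n + 2)
            + Fraction(n * (23 * n + 17), 2))

def b_iterated(n_max):
    """Iterate the first-order recurrence for B_n/(n+1) from B_0 = 0."""
    values = [Fraction(0)]
    for n in range(1, n_max + 1):
        h, h2 = harmonic(n), harmonic(n, 2)
        ratio = (values[-1] / n + 2 * (h * h - h2) - 4 * (h - 1)
                 - Fraction((n - 1) * (n - 2), 2 * n * (n + 1)))
        values.append((n + 1) * ratio)
    return values

iterated = b_iterated(20)
```

For example, `iterated[3]` is 7/3 and `iterated[4]` is 29/3, in agreement with `b_closed`.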

This solves the recurrence (1), so by Theorem 2.3 the proof of Theorem 2.1 is
complete. Consequently, the variance of the number of pairwise comparisons
$C_n$ of randomised Quicksort is equal to

$$7n^2 - 4(n+1)^2 H_n^{(2)} - 2(n+1)H_n + 13n.$$